Analysis of a data set regarding faculty salaries
## # A tibble: 6 × 17
## FedID UnivName State Tier AvgFu…¹ AvgAs…² AvgAs…³ AvgPr…⁴ AvgFu…⁵ AvgAs…⁶
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1061 Alaska Paci… AK IIB 454 382 362 382 567 485
## 2 1063 Univ.Alaska… AK I 686 560 432 508 914 753
## 3 1065 Univ.Alaska… AK IIA 533 494 329 415 716 663
## 4 11462 Univ.Alaska… AK IIA 612 507 414 498 825 681
## 5 1002 Alabama Agr… AL IIA 442 369 310 350 530 444
## 6 1004 University … AL IIA 441 385 310 388 542 473
## # … with 7 more variables: AvgAssistProfComp <dbl>, AvgProfCompAll <dbl>,
## # NumFullProfs <dbl>, NumAssocProfs <dbl>, NumAssistProfs <dbl>,
## # NumInstructors <dbl>, NumFacultyAll <dbl>, and abbreviated variable names
## # ¹AvgFullProfSalary, ²AvgAssocProfSalary, ³AvgAssistProfSalary,
## # ⁴AvgProfSalaryAll, ⁵AvgFullProfComp, ⁶AvgAssocProfComp
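This is not a “tidy” data set: the salary, compensation, and head count for
each faculty rank sit in separate columns. I can clean it by writing a
function that can be reused whenever data arrives in this layout.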
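A minimal sketch of what such a reusable function might look like. The original function is not shown, so `to_snake()` and `tidy_salaries()` are hypothetical names, and the use of `tidyr::pivot_longer()` is an assumption, chosen because three successive pivots give a snake_case column layout of the kind a cleaned version of this data would have. The sample row is the first row of the printed tibble; `AvgAssistProfComp` is not visible in that printout, so it is left as `NA`.

```r
library(tidyr)

# Hypothetical helper: CamelCase -> snake_case (e.g. "FedID" -> "fed_id").
to_snake <- function(x) {
  tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", x))
}

# Hypothetical cleaning function (an assumption, not the author's code).
# Each pivot_longer() call stacks one group of per-rank columns.
tidy_salaries <- function(df) {
  names(df) <- to_snake(names(df))
  df |>
    pivot_longer(
      cols = c(avg_full_prof_salary, avg_assoc_prof_salary, avg_assist_prof_salary),
      names_to = "rank", names_prefix = "avg_", values_to = "salary"
    ) |>
    pivot_longer(
      cols = c(avg_full_prof_comp, avg_assoc_prof_comp, avg_assist_prof_comp),
      names_to = "comp_type", values_to = "comp_amt"
    ) |>
    pivot_longer(
      cols = c(num_full_profs, num_assoc_profs, num_assist_profs),
      names_to = "faculty_type", values_to = "faculty_count"
    )
}

# First row of the printed tibble; AvgAssistProfComp is not shown there.
raw <- data.frame(
  FedID = 1061, UnivName = "Alaska Pacific University",
  State = "AK", Tier = "IIB",
  AvgFullProfSalary = 454, AvgAssocProfSalary = 382, AvgAssistProfSalary = 362,
  AvgProfSalaryAll = 382,
  AvgFullProfComp = 567, AvgAssocProfComp = 485, AvgAssistProfComp = NA_real_,
  AvgProfCompAll = 487,
  NumFullProfs = 6, NumAssocProfs = 11, NumAssistProfs = 9,
  NumInstructors = 4, NumFacultyAll = 32
)

tidy <- tidy_salaries(raw)
head(tidy)
```

Because the three pivots are applied one after another, every salary row is crossed with every compensation row and every head-count row, so each institution expands to 3 × 3 × 3 rows. A single pivot keyed on faculty rank would avoid that repetition, at the cost of a different column layout.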

| fed_id | univ_name | state | tier | avg_prof_salary_all | avg_prof_comp_all | num_instructors | num_faculty_all | rank | salary | comp_type | comp_amt | faculty_type | faculty_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1061 | Alaska Pacific University | AK | IIB | 382 | 487 | 4 | 32 | full_prof_salary | 454 | avg_full_prof_comp | 567 | num_full_profs | 6 |
| 1061 | Alaska Pacific University | AK | IIB | 382 | 487 | 4 | 32 | full_prof_salary | 454 | avg_full_prof_comp | 567 | num_assoc_profs | 11 |
| 1061 | Alaska Pacific University | AK | IIB | 382 | 487 | 4 | 32 | full_prof_salary | 454 | avg_full_prof_comp | 567 | num_assist_profs | 9 |
| 1061 | Alaska Pacific University | AK | IIB | 382 | 487 | 4 | 32 | full_prof_salary | 454 | avg_assoc_prof_comp | 485 | num_full_profs | 6 |
| 1061 | Alaska Pacific University | AK | IIB | 382 | 487 | 4 | 32 | full_prof_salary | 454 | avg_assoc_prof_comp | 485 | num_assoc_profs | 11 |
| 1061 | Alaska Pacific University | AK | IIB | 382 | 487 | 4 | 32 | full_prof_salary | 454 | avg_assoc_prof_comp | 485 | num_assist_profs | 9 |

ANOVA (analysis of variance) is a linear modeling method for evaluating
the relationships between variables. Its output shows how much of the
variation in the outcome each variable accounts for, so the variables can
be ranked by their impact. We can use tools like this to identify
variables worth exploring when we change our experiments and workflows,
or to make predictions for the future.
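As an illustration only, the sketch below fits an ANOVA with base R's `aov()` to the six rows printed at the top of this report. The choice of outcome (`AvgFullProfSalary`) and predictors (`Tier`, `State`) is an assumption made for the example, not necessarily the model used in this analysis, and six rows are far too few to draw conclusions from.

```r
# The six rows shown in the printed tibble (selected columns only).
salaries <- data.frame(
  State = c("AK", "AK", "AK", "AK", "AL", "AL"),
  Tier  = c("IIB", "I", "IIA", "IIA", "IIA", "IIA"),
  AvgFullProfSalary = c(454, 686, 533, 612, 442, 441)
)

# Assumed model for illustration: salary explained by tier and state.
fit <- aov(AvgFullProfSalary ~ Tier + State, data = salaries)

# ANOVA table: one row per term plus the residuals.
tab <- anova(fit)

# Order terms by the sum of squares they account for. These are sequential
# sums of squares, so the result depends on the order of terms in the formula.
terms_only <- tab[rownames(tab) != "Residuals", ]
ranked <- terms_only[order(-terms_only[["Sum Sq"]]), ]
print(ranked)
```

Sorting by `Sum Sq` is one simple way to rank terms; with unbalanced data like this, reordering the formula can change the ranking, so it is worth checking both orders.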
ANOVA is just one method of modeling. Many others are readily
available in R and RStudio. It is my job to use the objective output
of the software to select the model that best fits each
data set.
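One common way to compare candidate models objectively is an information criterion such as AIC, where a lower value indicates a better trade-off between fit and complexity. The sketch below is an assumption about how such a comparison might be set up, again using only the six rows printed at the top of this report; the two candidate formulas are hypothetical.

```r
# The six rows shown in the printed tibble (selected columns only).
salaries <- data.frame(
  State = c("AK", "AK", "AK", "AK", "AL", "AL"),
  Tier  = c("IIB", "I", "IIA", "IIA", "IIA", "IIA"),
  AvgFullProfSalary = c(454, 686, 533, 612, 442, 441)
)

# Two hypothetical candidate models for the same outcome.
m_tier       <- lm(AvgFullProfSalary ~ Tier, data = salaries)
m_tier_state <- lm(AvgFullProfSalary ~ Tier + State, data = salaries)

# AIC() returns one row per model; lower AIC is preferred.
cmp <- AIC(m_tier, m_tier_state)
print(cmp[order(cmp$AIC), ])
```

AIC is only one criterion; cross-validation or residual diagnostics may be more appropriate depending on the data set and the purpose of the model.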